ROptimus: a parallel general-purpose adaptive optimization engine

Summary Motivation Various computational biology calculations require a probabilistic optimization protocol to determine the parameters that capture the system at a desired state in the configurational space. Many existing methods excel at certain scenarios, but fail in others due, in part, to an inefficient exploration of the parameter space and easy trapping into local minima. Here, we developed a general-purpose optimization engine in R that can be plugged to any, simple or complex, modelling initiative through a few lucid interfacing functions, to perform a seamless optimization with rigorous parameter sampling. Results ROptimus features simulated annealing and replica exchange implementations equipped with adaptive thermoregulation to drive Monte Carlo optimization process in a flexible manner, through constrained acceptance frequency but unconstrained adaptive pseudo temperature regimens. We exemplify the applicability of our R optimizer to a diverse set of problems spanning data analyses and computational biology tasks. Availability and implementation ROptimus is written and implemented in R, and is freely available from CRAN (http://cran.r-project.org/web/packages/ROptimus/index.html) and GitHub (http://github.com/SahakyanLab/ROptimus).


Introduction Motivation
For a complex model, where the unknown parameters cannot be determined by conventional linear or non-linear fitting techniques, methods for optimisation based on biased random sampling of the parameter space, are the methods of choice (Cowles and Carlin 1996). In such cases, the quality of the solution found, following some optimisation protocol, depends on that protocol's ability to effectively explore the parameter space (Gilks, Richardson, and Spiegelhalter 1996) and be drawn to more favourable areas therein. Many existing methods perform well for certain modelling scenarios, but fail in others due, in part, to an inefficient exploration of the parameter space and easy trapping into local minima with regard to the metrics for the quality of the found solutions. In this manual, we present ROptimus, a universal Monte Carlo optimisation engine in R with acceptance ratio annealing, replica exchange and adaptive thermoregulation. It can universally interface with any modelling initiative through its interfacing functions, and optimise the model parameters by effectively exploring the parameter space. Controlled by a user, ROptimus can execute either an annealing or a replica exchange procedure, however, both driven by acceptance ratio adjustment, rather than by altering pseudo temperature. This manual will begin with a basic overview of Monte Carlo optimisation and the common temperature-based simulated annealing framework. It will then proceed with a presentation of the acceptance ratio annealing procedure of ROptimus, its adaptive thermoregulation feature, and its replica exchange procedure. After an explanation of how to download the ROptimus R package from GitHub and install it locally, five stand-alone tutorials will be presented to illustrate how users should employ ROptimus and to demonstrate its flexibility as an optimisation engine across a wide variety of optimisation scenarios. Finally, an "Advanced User Manual" section is included, in which all possible and tested input parameters to ROptimus are outlined and the output formats are detailed.

Briefly on Monte Carlo and Temperature-Driven Simulated Annealing Procedure
Let us assume that our model is a certain function m() that performs operations on the inputted K coefficient set and returns the observable object O = m(K). Our task is to optimise K, the set of coefficients, so that the error metric u(O) that measures the violations from the target O trg set by the model-generated O is minimal. In a Monte Carlo optimisation procedure, we can define a pseudo-energy e(u) of the system as a function of u(), where lower values of the pseudo energy e correspond to better candidate solutions K. In order to find a better set of K, we need to alter it by a certain rule r() that, if repeated many times, enables the sampling of the parameter space for K. One can then evaluate the pseudo energies before and after the alterations: We then accept or reject the alteration move, meaning we accept the new set of r(K) coefficients as the new K or revert back to the previous K, guided by the following acceptance probability, as postulated in a Metropolis criterion (Chib and Greenberg 1995), (Chen and Roux 2015): where T is the pseudo temperature, that should be always greater than 0. For a given ∆E > 0 energy difference, one would have a certain stringency for accepting the alteration move, depending on the value of the pseudo temperature T . Therefore, if the Monte Carlo simulation was to modify the K set to a state where any further move would increase the pseudo energy at a great enough amount for the moves to be almost always rejected, then one could overcome that state and further sample the other values in the parameter space by increasing the value of the pseudo temperature.
To this end, one way to overcome the barriers is by using the technique known as simulated annealing, where we gradually anneal the temperature from a higher value to lower during the course of the simulation (Kirkpatrick 1984). The parts of the simulation where the pseudo temperature is higher, allows relatively unconstrained exploration of the parameter search space, whereas those parts with a lower pseudo temperature limit the search to a more local area of the energy landscape and associated parameter space. Multiple cycles of this annealing procedure can thus be executed to increase the overall sampling.

Acceptance Ratio Simulated Annealing Procedure
A significant limitation of the pseudo-temperature-based simulated annealing, while used only in an optimisation objectives, is that a given scheme of temperature annealing might be efficient for some models or pseudo-energy metrics, but not efficient for others (Ingber 1993). A temperature for a given system at a given point of the annealing cycle that is designated to be quite permissive in terms of accepting the moves, can actually not be permissive for some other models or systems simulated. This depends on the value and scale of the ∆E energy difference, as can be seen in the equation for p accept , and can be affected by any of the e(), u(), m() components, and the system configuration K. This necessitates a pre-adjustment of the pseudo-temperature values from one modelling objective to another. Furthermore, even within a single model optimisation procedure on a defined system, the pseudo-energy metric can shift into a scale (due to a significant alteration/shift in the system as per the alteration rule r()) that does not match with the selected temperature scheme anymore, leading to a poor sampling of the parameter space when optimisation is at stake within restricted time and computational capacity. This can often be the case when the pseudo energy of the system does not exhibit a smooth dependency on K, loosely meaning that close states of K do not necessarily produce close pseudo-energy values (for instance, due to an alteration rule that randomly expands or trims the system, such as when equation terms are randomly turned on or off).
To this end, in general cases where A) we do not deal with energies and temperatures that display the smoothness or roughness emulating real physical systems, such as while simulating molecular systems; B) we are only interested in final optimised configuration of K, and do not need to characterise the statistical populance as an ensemble of reachable states/solutions, we can anneal a metric for crossing different barriers that is more transferable to different systems and system configurations. As such a metric, ROptimus uses the acceptance ratio, expressed as a fraction of accepted moves within certain number of past steps (STATWINDOW).
In a given annealing cycle, ROptimus constructs a linear target acceptance ratio regiment for each step. This is done based on an initial target acceptance ratio, a final acceptance ratio, and the number of iterations in each cycle for a given optimisation run (all of which can be specified as inputs). Once the optimisation process begins, ROptimus calculates an observed acceptance ratio at the end of each STATWINDOW (a fixed number of steps which can be specified by the user) by calculating the fraction of the accepted moves from all the past trials in the current STATWINDOW. Thereafter, ROptimus compares the observed acceptance ratio with the pre-defined target acceptance ratio based on the annealing schedule and determines whether and how to alter the system pseudo-temperature (adaptive thermoregulation) to align the observed acceptance ratio with the target ratio at the end of the following STATWINDOW. Thus, by employing acceptance ratio annealing and adaptive thermoregulation, ROptimus is able to methodically explore the parameter space for K even when no smooth relationship exists between K and the system pseudo energy.

Adaptive Thermoregulation
All decisions governing the adaptive pseudo-temperature alterations on the system are made by a Temperature Control Unit (TCU) in ROptimus. Note, that the TCU is completely encapsulated as a backend unit, such that modifications can be easily made by advanced users if needed. This section articulates the exact protocol followed by the default state of the TCU.
The initial system temperature is specified as an input argument. At the end of each STATWINDOW, if the observed acceptance ratio is within a fixed range T.DELTA (specified as an input argument) of the target acceptance ratio based on the annealing schedule, the TCU will make no change to the current system pseudo temperature. If the observed acceptance ratio is less than the ideal ratio and outside the range of T.DELTA, the TCU will increase the system pseudo temperature by a value T.ADJSTEP (the initial value of T.ADJSTEP is specified as an input argument). Similarly, if the observed acceptance ratio is greater than the ideal ratio and outside the range of T.DELTA, the TCU will reduce the temperature by a value T.ADJSTEP.
If the observed acceptance ratio keeps being below the ideal acceptance ratio for TSCLnum (an integer input argument) number of subsequent STATWINDOWs, T.ADJSTEP will be increased by a factor T.SCALING (an input argument). Similarly, T.ADJSTEP will also be decreased by a factor T.SCALING if the observed acceptance ratio is greater than the ideal acceptance ratio for TSCLnum subsequent STATWINDOWs. T.ADJSTEP is reset to its original input value whenever a series of subsequent observed acceptance ratios being greater than/less than ideal acceptance ratios is broken. If ever the TCU subtracts T.ADJSTEP from the current temperature and the result is a negative value, the system pseudo temperature is set to T.MIN (an input argument). The final feature of the TCU is that although the initial system pseudo temperature is specified by the user, if multiple annealing cycles are employed, the initial pseudo temperature for acceptance ratio annealing cycles, after the first cycle, is inferred from the decisions of the TCU on previous cycles.
The above collection of TCU decision rules with their default values result in pseudo-temperature modulations that assure the observed acceptance ratios during ROptimus runs to follow the ideal acceptance ratios remarkably well. Moreover, as will be highlighted in the five tutorials below, large temperature alterations are often required to align the observed acceptance ratios with the ideal ratios, a task which ROptimus excels at, whereas other protocols would have difficulties and result in poor parameter sampling because of fixed or gradually changing temperature regiments.

Acceptance Ratio Replica Exchange Procedure
ROptimus additionally supports acceptance ratio replica exchange as an optimisation mode, which can be selected in place of acceptance ratio annealing, provided that the user has access to multiple processors (ideally at least 4, and preferably 8 or more). The inspiration for this additional mode was taken from the parallel tempering Monte Carlo techniques and Replica Exchange Molecular Dynamics (REMD) simulations (Sugita and Okamoto 1999). In particular, REMD simulation is a technique employed to obtain equilibrium sampling of a molecular system (for instance, a protein) at a certain fixed (usually the lowest in a range) temperature. Let T = {T 1 , T 2 , ..., T n } be a set of n distinct temperatures for which T 1 < T 2 < ... < T n . In REMD, n replicas of molecular dynamics simulations for a given system are initialised at each T i ∈ T . Note, that each temperature T i corresponds to a slightly different potential of the examined system to roam the energy landscape and cross the barriers. States possible to populate at a temperature T i can become a bit more accessible at a temperature T i+1 and so on. If the state configurations in adjacent replicas are allowed to exchange, the simulation will be able to overcome energy barriers at various temperature replicas and thoroughly explore the parameter space. Moreover, for configuration x n in replica T i , and configuration x m in replica T i+1 , equilibrium sampling is achieved by selecting appropriate exchange probabilities (Sugita and Okamoto 1999), generally expressed as: Thus, overall, replica exchange simulations can be executed by simulating n replicas of a simulation at distinct temperatures simultaneously and independently, next randomly selecting two configurations in adjacent replicas and exchanging them with probability p re .

Comparison with Related R Packages
ROptimus strengths are at being a general-purpose optimisation engine that can be applied to a diverse set of problems through the flexibility and modularity of its interfacing functions and the capability to extensively search the parameter space by using a transferable process that uses acceptance ratio-based adaptive thermoregulation, in lieu of the situational tuning of the conventional, more system-and scaledependent pseudo temperature, and by offering an all-purpose replica exchange option borrowed from parallel tempering often implemented in molecular dynamics simulations and other Monte Carlo methods.
CRAN Task View: Optimization and Mathematical Programming (Version: 2022-04-12) (Schwendinger and Borchers 2022) lists currently available (non-archived), general-purpose and tailored R packages for dealing with optimisation problems. Categories include general-purpose continuous, quadratic programming, least-squares, semidefinite and convex, global and stochastic, mathematical programming, multi-objective and combinatorial solvers, and those for highly specific applications. It also lists optimisation infrastructure packages, interfaces to open source and commercial solvers, and libraries for benchmarking optimisation algorithms. Among the packages indicated, the ones with simulated annealing (SA) capabilities are NMOF (Gilli, Maringer, and Schumann 2019), (Schumann 2022), dclone (Solymos 2010), GenSA (Xiang et al. 2013), qap (Hahsler 2017) and stats (R Core Team 2021). Not present in the list but also implement SA are optimization (Husmann, Lange, and Spiegel 2017), likelihood (Murphy 2022), pomp (King, Nguyen, and Ionides 2016), (King et al. 2022) and subselect (Orestes Cerdeira et al. 2020) packages. Unlike ROptimus, however, the above packages employ the conventional pseudo temperature-based annealing procedure, which subjects them to problems detailed in previous sections e.g. the current temperature scale being suddenly irrelevant due to huge alterations in the system. Since their temperature control (e.g. monotonic, geometric and logarithmic cooling) is not adaptive, users have to pay extra attention to input arguments controlling the thermoregulation still with no assurance of effective parameter space sampling especially for complex problems with many local optimum traps. Finally, the aforementioned packages, except NMOF, expect limited or a specific format of a problem (e.g. fitting coefficients, maximum likelihood estimation) and therefore, expect specific data types for inputs (e.g. numeric vectors or lists). Consequently, the alteration rule for generating new candidate solutions, sometimes hard-coded within the function, is constrained making it difficult to repurpose these packages. In contrast, a key strength of ROptimus is its modularity and flexibility, allowing users to design any model, objective/cost and alteration functions that can be made to handle any R objects.

How to Cite
The usage of ROptimus should be accompanied with the following citations:

Installation Instructions
Installing ROptimus locally for immediate use requires only R and a connection to the Internet. It is available both on CRAN (stable release) and on GitHub (latest release). To install the latter version, open an R client and execute the commands below. Note, that the latest version of Rtools is required for this installation to work. If it is not available locally and the installation is attempted from within RStudio, a prompt will appear to download and install Rtools. After following those instructions, restart the RStudio session before reattempting the ROptimus installation.

Execution Instructions
ROptimus optimisation engine works seamlessly and requires little to no intervention from the user while applied to a wide variety of optimisation tasks. The part where the user must intervene is the specification of K, and the R functions corresponding to r(), m(), and the combined e • u(). Those objects serve as the application interface of the ROptimus engine to any optimisation task. The port through which additional user data or objects can be supplied to ROptimus, in case those are needed in the specification of a model, is implemented through the DATA argument that is passed to the model function m() and u(). All the interfacing functions and objects have a minimal and well defined internal compliance rules in ROptimus, and otherwise can be flexible and as simple or complicated as the user requires, to properly plug ROptimus into a desired modelling objective. The whole optimisation procedure with acceptance ratio annealing or replica exchange is then fully taken care of by the ROptimus engine.
The five tutorials below showcase the usage of ROptimus in five, broadly different scenarios, with a particular focus on how to write the interfacing functions to plug the ROptimus engine, and how the acceptance ratio annealing mode performs in comparison to the acceptance ratio replica exchange one. The examples also cover a usage scenario where an external, more complicated, program is linked to the R procedure. Furthermore, we provide a built in function OptimusExamples(), which can generate the necessary inputs for any of the mentioned five tutorials, ready for modifications to tailor those to particular needs. The OptimusExamples() function requires, at minimum, the tutorial identifier (1 to 5) along with the methodology specification (simulated annealing -SA; replica exchange -RE), in order to generate the interfacing ROptimus input, which by default is saved as an example.R file within the current directory (all alterable). The function can also save the vignette corresponding to the requested example, for further information about the example, if needed for further alterations.
The recommended way to use ROptimus would thus be by exploring and understanding the below tutorials, finding the example closest to one's objective, generating the input for that example using OptimusExamples(): OptimusExamples(example=2, method="SA") then modifying the input objects to your specific needs and workflow.

Tutorial 1: Finding Coefficients for a Polynomial Function Problem Statement
In this example, we shall use ROptimus to find the coefficients of the polynomial function that is known to represent the observations y the best. This, of course, is a simple task that can be addressed more robustly by least-squares linear model fitting. However, by starting with this example, we shall focus our attention on the organisation of the ROptimus input, rather than the complexity of the task.
First of all, let us create some data for the example.

Defining ROptimus Inputs
Now we can set up the inputs for the ROptimus run. We shall use the model k 1 x + k 2 x 2 + k 3 x 3 + k 4 x 4 to fit the y observables based on the values for x. The dependent functions that are needed for setting up an ROptimus run, are given as inputs in the Optimus() call.
First, we need to create the essential object K, which stores the initial values for the parameter(s) to be optimised. K can be an object of any type. From a single numeric or character value to a vector of values or a data frame holding, say, Cartesian coordinates of a molecule to be optimised. The only ROptimus requirement from K is that it should be something alterable (via a rule function r(), see below) and should influence the outcome of another required model functionm() (see below). In this example, we have 4 coefficients to optimise from some random initial state. We can thus make K be a numeric vector of size 4. Let us start from all the components being 1.0, which, as entries in K, can be both named and unnamed. Though not the case here, the entry-named data for K can be essential for some models that specifically use coefficient names, for instance when a system of ODEs is used in the model function m() in one of the tutorials.
K <-c(k1=1.0, k2=1.0, k3=1.0, k4=1.0) # entries are named as k1, k2, k3 and k4 Second, we should create the function m() for the model. The function m() should be designed to operate on the whole set of parameter snapshot K and return the corresponding observable object O. Please note that the size and shape of K and O are not necessarily to match, depending on the nature of the model used.
Operating on K is one of the hard conditions on m(), which can optionally operate on other data as well.
In our situation, the function m() should operate on the provided instance of four coefficients (in the object K), and, additionally on the values x. It should then return a vector of observations O (to be compared with y target observations) of the same size as vector x. Any additional data required by the model, in our case an object with the set of 1000 x values, must be provided to the function in an input variable DATA, a list holding the additional data that must be accessed by m() and u() (see below). The variable DATA must be provided to Optimus(), and m() must take it as an input. In the case that neither m() nor u() require additional data, the two functions should still be created such that they take a variable DATA as an input, and the variable DATA passed to Optimus() will be set to NULL).
# Generating an object that is then to be passed to DATA argument of Optimus(). # No need to call it DATA, as soon as it is passed to the DATA argument of Optimus().
At this point, calling m(K=K, DATA=DATA) from within Optimus() will return the predicted O set from the initial, non-optimal values for K, hence rather far from the target O trg = y.
In this example, the optimisation goal is for the O model outcomes to come as close as possible to the target observations y, to be achieved by optimising the coefficients K. The object y holding the target values therefore also needs to be specified and given as an input to the main Optimus() function (as a constituent entry in the DATA argument), just like x was supplied, as required, in this example, by the function m(). Now, we need to define how the performance of a given snapshot of coefficients K is to be evaluated. For ROptimus, this is done by specifying a function u(), which should necessarily take as inputs O (the output of m()) and the variable DATA. The output should have two components, Q holding a single number of the quality of the K coefficients, and E holding a (pseudo) energy for the given snapshot K. It is important that the returned (pseudo) energy value must be lower for better performance/version of K, never vice versa. The Q component of the u() function output is only used for plotting the optimisation process, and, if desired, can just repeat the value of the E component.
For our example, the u() function will assess the agreement between the snapshot of predictions O and the complete set of real observables (target) y. Here, we can use RMSD between O and y as a measure of K snapshot quality (Q). Since better agreement means better RMSD, it can be directly used as a pseudo energy (E), without putting a negative sign or performing some other mathematical operation on Q. RESULT <-NULL RESULT$Q <-Q RESULT$E <-E return(RESULT) } And finally, we need to define the rule, by which the K coefficient vector is to be altered from one step to another. This is done by defining a rule function r() that must take K, and return an object analogous to K, but with some alteration(s). In this example, for each snapshot of K, we shall randomly select one of its four coefficients, then either increment or decrement (chosen randomly) it by 0.0005, returning the altered set of coefficients.

return(K.new) }
All the constructed objects (K) and functions (m, u, r), as well as the data required by m() and u() (stored in the variable to be passed to DATA) should be defined in an R session and given to Optimus() as inputs. The users are free to define some dependencies as additional files (for example: initial protein geometry for a Monte-Carlo optimisation), which should be called from within the function definitions.

Acceptance Ratio Simulated Annealing ROptimus Run
Having constructed K, dependent data for DATA argument of Optimus(), m(), u() and r(), we are now ready to call Optimus(). Let us first investigate the Acceptance Ratio Annealing (SA) version of ROptimus on 4 CPUs (the vast majority of personal computers currently have at least 4 CPUs), which can be executed as follows: Optimus(NCPU=4, OPTNAME="poly_4_SA", LONG=FALSE, OPT.TYPE="SA", K.INITIAL=K, rDEF=r, mDEF=m, uDEF=u, DATA=DATA) Note that the field LONG=FALSE is included in the function call so that all data from the optimsation process is saved. Calling Optimus() with LONG=TRUE will result in a memory saving optimisation process (more details in the Advanced Usage section in this document). Of the 4 optimisation replicas, the second and fourth CPUs found the best parameter configuration (lowest RMSD) in our trial:

Acceptance Ratio Simulated Annealing ROptimus Fitting (4 Cores)
x y Notice that although the RMSD of this solution, 28.841, is greater than the RMSD of the least squares solution, 28.827, it is less than the RMSD of the de-noised data found above, 28.859.
The two graphs below illustrate i) the evolution of the system pseudo temperature, in response to alterations made by the Temperature Control Unit (TCU), as a function of the optimsation step; and ii) the observed acceptance ratio as a function of the optimisation step, respectively. The graphs show data from the last 20 000 steps of the optimisation executed by CPU 2.

Observed acceptance ratio evolution (CPU 2)
Step Acceptance ratios (%) In the first plot, the solid red line tracks the observed acceptance ratios calculated by ROptimus at the end of each STATWINDOW and the dashed black line tracks the target acceptance ratio based on the annealing schedule. From the above two graphs, notice that while the observed acceptance ratio tracks the target acceptance ratio closely, the system pseudo temperature changes significantly and non-monotonically. This illustrates that the adaptive thermoregulation allows ROptimus to effectively anneal the system acceptance ratio.

Acceptance Ratio Replica Exchange ROptimus Run
Let us now consider the Replica Exchange version of ROptimus on 12 CPUs. The purpose here is to illustrate how to run an optimisation using the Replica Exchange version of ROptimus; this method is of course an overkill for solving this simple task.
In addition to the arguments specified above, the Replica Exchange version of ROptimus also requires an input variable ACCRATIO, which is a vector that defines the acceptance ratios to be used for each of the replicas initiated, 12 in this case. Note that the length of ACCRATIO must always be equal to the argument NCPU.
Let us now briefly examine the evolution of the system pseudo temperature and acceptance ratio compliance in response to the adaptive thermoregulation. The following two graphs represent data from the last 20 000 steps of optimisation replica running on CPU 8 (fixed 34% target acceptance ratio).

Observed acceptance ratio (CPU 8 − 34% acceptance ratio)
Step Acceptance ratios (%) Notice that in the observed acceptance ratio graph, the dashed line indicating the target acceptance ratio is constant (as opposed to linearly changing as in acceptance ratio annealing). This is because each processor in the replica exchange mode has a single target acceptance ratio, as described above. Here again, adaptive decisions on the pseudo temperature to maintain the desired acceptance ratio result in non-monotonic, and non-uniform pseudo temperature adjustments, while the observed acceptance ratios fluctuate relatively closely around the set target value.

Summary
We now understand the input requirements to interface with the Acceptance Ratio Simulated Annealing and Replica Exchange versions of ROptimus. In this example, both versions retrieved solutions having a lower RMSD than the de-noised data, and only a slightly greater RMSD than the optimal least squares solution.
Replica Exchange resulted in a better solution than Simulated Annealing, at the cost of greater computing resources.

Tutorial 2: Finding an Optimal Equation by Navigating Through a Term Space Problem Statement
Consider again the synthetic data that was created in Tutorial 1. Suppose that we were only provided with the data and, unlike in Tutorial 1, had no knowledge of the best terms to be included in a functional representation of said data. In this example, we shall use ROptimus to determine which terms should be used in a least squares fitting of the data to achieve a representation with low RMSD while avoiding overfitting the data with too complicated equation.
Let us start by generating the same data which was used in Tutorial 1: From Tutorial 1, we know that if presented with this data and under the assumption that the most appropriate model to describe the data is We also know that the RMSD between the observed data y and the linear model fitting outcome is:

Defining ROptimus Inputs
Let us first define an ordered set terms that is a collection of candidate terms to include in the representation of the data: Let terms i denote the i th term in the set terms (for example, terms 14 = cos(x)). We shall use the model: where each k i is a binary variable (meaning a variable taking a value of either 0 or 1) indicating whether the i th term is included in the representation, each c i is a non-zero coefficient for the i th term and b is a real number (the intercept). In our case, card(terms) = 30 so explicitly, our model is: Formally, K will be a numeric vector of length card(terms) whose i th entry is k i . K uniquely specifies a set activeT erms = {terms i ∀ i |k i = 1}. Note that activeT erms ⊆ terms. Each binary variable k i should be initialized randomly as below: K <-c(term1=rbinom(n=1, size=1, prob=0.5), term2=rbinom(n=1, size=1, prob=0.5), term3=rbinom(n=1, size=1, prob=0.5), term4=rbinom(n=1, size=1, prob=0.5), term5=rbinom(n=1, size=1, prob=0.5), term6=rbinom(n=1, size=1, prob=0.5), term7=rbinom(n=1, size=1, prob=0.5), term8=rbinom(n=1, size=1, prob=0.5), term9=rbinom(n=1, size=1, prob=0.5), term10=rbinom(n=1, size=1, prob=0.5), term11=rbinom(n=1, size=1, prob=0.5), term12=rbinom(n=1, size=1, prob=0.5), term13=rbinom(n=1, size=1, prob=0.5), Next, we must define the model function m() that will operate on the parameter snapshot K and return an observable object O. For a given set activeT erms specified by K, m() will fit a linear model to the data using the entries in activeT erms and using the built in generalised linear model (glm()) function in R, thereby determining values for the variables c i and b. Accordingly, m() will require access to the variables x and y, which will be provided as entries in DATA, a variable of type list, as in Tutorial 1. The object O will be the corresponding output of the function glm(). In the case that the set activeT erms is the empty set (meaning that all entries in K are 0), m() will fit a model using the relationship y~x.
The target representation will be the fitted model M (whose terms are elements of terms) that minimises the AIC. It is important to note that the 2p term in the AIC penalises overfitting by increasing AIC as function of the number of parameters, while the −2ln(L) term rewards models that better represent the data by decreasing AIC as a function of the likelihood of the model.
As articulated in Tutorial 1, the output of u() should have a component E holding a pseudo energy for the parameter snapshot K, and a component Q that can be used for plotting the optimisation process. In this case, E will be equal to the value of AIC (implemented using the built in AIC() function in R) and Q will be equal to the RMSD between the predicted values of y from the fitted model and the actual y values, used for plotting purpose. Consequently, u() will need access to the variable y. The definition of u() is below: Finally, we need to define the rule function r(). We will adopt the following simple procedure: randomly select an equation entry from K and switch its value to the other binary value (on to off, or off to on).
Having defined all the necessary inputs, we are now ready to call Optimus().
An important remark is that modelling this problem in this manner results in an objective function (AIC) that is not smooth because small changes in the parameter set K (as defined by r()) can produce significantly large changes in the objective value. The equation, i.e. the system itself, changes from one step to another. Therefore, an entirely different model is being used to fit the data at each step. Optimisation procedures in such cases are in danger to be trapped in certain minima due to a major change in pseudo temperatures necessary to overcome barriers while the system abruptly evolves. Despite this, we will see that ROptimus will get the job done by arriving at good solutions largely as a consequence of its adaptive thermoregulation and acceptance-ratio-guided optimisation procedure.

Acceptance Ratio Simulated Annealing ROptimus Run
In addition to the inputs defined above, Optimus() can optionally take other inputs to dictate the optimisation process (see the Advanced User Manual), all of which have built in default values and some of which will be altered in this example due to the increased computational complexity of the model defined in this tutorial compared to that of Tutorial 1. The variable NUMITER represents the number of iterations of the optimisation process (per core) and has a default value of 1 000 000. For this example, 200 000 iterations will be used to reduce the running time of ROptimus given that each iteration is more computationally demanding than in Tutorial 1. The variable CYCLES (unique to Acceptance Ratio Annealing ROptimus runs) denotes the number of acceptance ratio annealing cycles. Its default value is 10, however it will be set to 2 in this example so that each annealing cycle has 100 000 iterations just as in Tutorial 1 (the number of steps per cycle is calculated as NUMITER/CYCLES). Lastly, the variable DUMP.FREQ, the frequency (in steps) with which the best found model is assessed and outputted by the function, will be set to 100 000 (its default value is 10 000).
Let us again investigate the Simulated Annealing (SA) version of ROptimus on 4 processors, which can be executed as follows: OPT.TYPE="SA", OPTNAME="term_4_SA", NUMITER=200000, CYCLES=2, DUMP.FREQ=100000, LONG=FALSE, K.INITIAL=K, rDEF=r, mDEF=m, uDEF=u, DATA=DATA) Interestingly, each of the 4 computing cores arrive at the same solution in this instance.  Thus, the optimal functional representation found by ROptimus has the following form: Below is the explicit representation after determining the coefficients c i and b: Notice that the solution selected by ROptimus results in an RMSD of 28.693 which is lower than the RMSD of the Least Squares Solution (28.827) that assumes the appropriate model is ROptimus selected a model which does not include all terms from the form used to generate the data. If a user were concerned by the fact that the model ROptimus selected contains more terms (6) than are used in the representation of the de-noised data (4), the user could either increase the multiplicative factor associated with the term p in the AIC to more strongly penalise representations involving a greater number of parameters. Alternatively, the user could also modify the function r() to ensure that only a fixed number of terms are ever active.
Let us now take a look at how the adaptive thermoregulation performed given this highly non-smooth objective. The graphs below should now feel very familiar, they represent data taken from the last 20 000 iterations of the optimisation protocol executed by CPU 1. Observed Acceptance Ratio (CPU 1) Step Acceptance ratios (%) Despite optimising a completely different model with a non-smooth objective function, the TCU succeeds in dynamically adjusting the system pseudo-temperature such that the observed acceptance ratio follows the annealing schedule rather well.

Acceptance Ratio Replica Exchange ROptimus Run
Let us now consider the Acceptance Ratio Replica Exchange version of ROptimus on 12 CPUs with the variable ACCRATIO defined as in Tutorial 1. 82,74,66,58,50,42,34,26,18,10,2) As in the Acceptance Ratio Simulated Annealing run above, we will again execute the optimisation procedure for 200 000 iterations. The Replica Exchange version of ROptimus takes an input argument EXCHANGE.FREQ (default value 1000) which specifies the total number of exchanges that will occur during the optimisation process. Consequently, the number of optimisation iterations that occur between subsequent exchanges between replicas can be calculated as NUMITER/EXCHANGE.FREQ, which is 200 iterations in this case.
Here we will set the input parameter STATWINDOW to have value 50 (its default value is 70). This signifies that the TCU will update the system pseudo temperature once every 50 iterations on each optimisation replica. This guarantees that 4 temperature adjustments will be made if a given replica is involved in two subsequent exchanges (because the number of iterations between exchanges is 200, as explained in the preceding paragraph) as opposed to merely 2 adjustments which would be the case if STATWINDOW were left to take its default value and could result in poor agreement between the observed acceptance ratio of the replica in question and the target acceptance ratio. The following line executes ROptimus with the above specified inputs: Optimus(NCPU=12, OPTNAME="term_12_RE", NUMITER=200000, STATWINDOW=50, DUMP.FREQ=100000, LONG=FALSE, OPT.TYPE="RE", ACCRATIO=ACCRATIO, K.INITIAL=K, rDEF=r, mDEF=m, uDEF=u, DATA=DATA) Nine of the optimisation replicas (CPUs 1, 2, 3, 5, 6, 8, 9, 10 and 12) recovered the same solution that was found by the Acceptance Ratio Simulated Annealing ROptimus run. Moreover, this solution is better (lower AIC) than those recovered by CPUs 4, 7 and 11. Thus, in this example, the Acceptance Ratio Simulated Annealing and Replica Exchange versions produce the same solution.

Replica Exchange ROptimus Fitting (12 Cores)
x y Please note that for convenience, only those replicas that produced a unique solution are listed in the table below (replicas 1, 2, 3, 6, 8, 9, 10 and 12 produced the same solution as replica 5; replica 11 produced the same solution as replica 7). Note that the various replica outcomes illustrate the penalising effects of the AIC on models using a greater number of parameters. Consider the solutions found by the 66% acceptance ratio replica and the 58% acceptance ratio replica (CPUs 4 and 5 respectively). Let y i denote the solution found by CPU i. Then, we have: Although the RMSD of y 4 , 28.6664, is lower than the RMSD of y 5 , 28.6932, y 5 has a lower value for AIC because it contains one less term than y 4 . Since AIC was specified as the objective metric, ROptimus (perhaps counterintuitively) selected y 5 as the more optimal solution to reduce overfitting. Observed Acceptance Ratio (CPU 5 − 58% Acceptance Ratio) Step Acceptance ratios (%) The above graphs are produced using data from the last 20 000 iterations of the 58% acceptance ratio replica (CPU 5). It is clear that the observed acceptance ratio more strongly oscillated around the target acceptance ratio than was the case in the Acceptance Ratio Simulated Annealing run from the previous part of this tutorial. More generally, it should be expected that the observed acceptance ratio fluctuates more significantly around the target acceptance ratio in Replica Exchange than in Simulated Annealing, especially when the objective function is non-smooth as is the case in this example. This is because an exchange between two replicas has the same effect as restarting a Monte Carlo optimisation from a random initial configuration with a temperature that very likely is not conducive to the target acceptance ratio for the given configuration. As such, each time an exchange occurs, significant deviations from the target acceptance ratio may occur and may require several STATWINDOWs for the TCU to correct. Despite this challenge, the TCU performs satisfactorily.

Summary
We now understand how to employ ROptimus to solve a more general problem than was addressed in Tutorial 1 and one with a non-smooth objective function. Additionally, we have a better understanding of the adaptive thermoregulation. Using the Akaike Information Criterion (AIC) as a metric with which to evaluate the performance of a candidate model, taking into account the desire to represent the data while avoiding to overfit the data, both the Acceptance Ratio Simulated Annealing and Replica Exchange versions of ROptimus recovered a better functional form to describe the data than the form which was assumed in Tutorial 1 (based on how the data had been generated), obviously with some overfitting to adapt to the noise while with the stringency used in this example.

Tutorial 3: Geometry Optimisation of Vitamin C Molecule Problem Statement
The focus of this tutorial is to depart from problem classes involving the search for functions to represent data, and demonstrate how ROptimus can be flexibly applied to arbitrary problem classes provided that they are formulated in accordance with ROptimus specifications. Additionally, this tutorial will illustrate that ROptimus can act as an optimisation kernel while calling external programs to execute a significant amount of the necessary computation for the optimisation process.
In this tutorial, ROptimus will be used, as an illustrative example, to 3D geometry optimise a molecular structure. Specifically, ROptimus will be used to determine the optimal values of two dihedral angles in the L-ascorbic acid (Vitamin C) molecule such that the molecule is in its ground state energy conformation. Vitamin C was selected to be the studied molecule because it has more than one freely rotating carbon-carbon bond and the potential for intramolecular hydrogen bonding due to the presence of multiple hydroxyl groups.
Moreover, Vitamin C is not a particularly large molecule. Due to these circumstances, Vitamin C can serve as a non-trivial case (as opposed to simpler molecules like ethane for instance), but one that does not require several days or weeks of calculations to arrive to optimal solutions (the optimisation procedures below took roughly 14-18 hours to terminate). This is the molecular structure of Vitamin C, with the numbering of non-hydrogen atoms provided from the scheme used in geometry specification: The major geometric features that drive the overall state of the molecule are the two C-C bonds in this structure: the bond joining carbon 3 and 6, and the bond joining carbon 6 and 10. The ground state conformation of Vitamin C will likely be a conformation such that steric clashes are minimised while also allowing for close proximity and right orientation between hydrogen bond donor and acceptor atoms. In the following sections, we formalise this optimisation problem and use ROptimus to arrive at the solution.

Defining ROptimus Inputs
As in the previous tutorials, we must first rigorously define the parameters that we are optimising. Let us begin by defining a dihedral angle as it will be used to our molecular geometry: a dihedral angle is the angle between two intersecting planes, where each plane is specified by 3 atoms of which 2 are common between both planes. Thus, a total of 4 atoms are needed to specify a dihedral angle. The conformation of Vitamin C with respect to its two freely rotating C-C bonds can be specified via two dihedral angles. Let ψ be the dihedral angle defined by the atoms numbered 1, 3, 6 and 7 and let φ be the dihedral angle defined by the atoms numbered 7, 6, 10 and 11. Having defined these two angles, we can now define the parameter set K as a numeric vector of length 2 whose entries are ψ and φ. We will arbitrarily initialise ψ and φ to have value 180. The corresponding Vitamin C conformation is illustrated below using a 3D structure and Newman projections along the two rotatable carbon-carbon bonds: In the 3D structure, grey denotes carbon, red denotes oxygen and white denotes hydrogen atoms.
K <-c(PHI=180, PSI=180) Now we will specify a model function m() that will operate on K. Starting from an arbitrary molecular conformation, altering the value of K will likely cause certain clashes or non-optimal interactions between atoms in the molecule that are not used in the definition of the angles ψ and φ. As such, after receiving an input set of parameters K, m() will have to alter the 3D location of constituents atoms while holding K fixed to arrive at the most stable geometry for the input K. Here, unlike in previous tutorials, to accomplish this task m() will call an external program MOPAC. MOPAC is a program for semiempirical quantum mechanics (QM) calculations, and can perform constrained and unconstrained geometry optimisations to arrive at a stationary state (note that calling MOPAC for a single initial geometry instance does not guarantee a global minimum will be found). MOPAC takes as input the specification of an initial molecular geometry in addition to an indication of which molecules the program is able to displace (or angles it can alter) and outputs a nearby local minimum molecular conformation with its corresponding energy in kcal/mol. For this optimisation problem, the input to MOPAC will be structured as a Z matrix, a common form for describing a molecular conformation which consists of using lengths, angles and dihedral angles with respect to previously defined atoms to define new atoms in the conformation.
The function m() will construct a Z matrix for Vitamin C using the input dihedral angles K and default values for the remaining geometric relationships needed to define the molecule. m() will then call MOPAC with the newly constructed Z matrix, specifying that all relationships may be altered by QM optimisation, except the input dihedral angles K. Finally, m() will return the energy calculated by MOPAC via PM6 Hamiltonian.

return(O) # heat of formation in kcal/mol }
Next, we define the function u() which returns an energy E and a quality Q of the candidate solution. Since the m() will already output a value for the physical energy of the candidate Vitamin C conformation, u() can simply set E to be the same return value of m(). We will make u() set Q to be the negative of the return value of m() such that candidate conformations with lower energies produce higher values of quality Q. Again, although u() does not require any additional data to accomplish this functionality, it must nevertheless be written to optionally accept an input parameter DATA.
Finally, we define the alteration function r(). r() will randomly select either ψ or φ to alter. Thereafter, r() randomly increases or decreases the selected angle by 2 degrees. r() will also ensure that ψ, φ ∈ [−180.0, 180.0] throughout the optimisation process.  The process of determining the energy of a conformation corresponding to a given set of angles ψ, φ is the most computationally intensive part of this optimisation formulation. Having defined the necessary inputs for Optimus(), it should be apparent that this calculation will entirely be handled by MOPAC. This ability to serve as an optimisation kernel and flexibly be knitted to an external program is one of the many strengths of ROptimus.

Defining a Benchmark Solution
Before calling Optimus(), we have established a benchmark solution to be used to independently evaluate the ability of ROptimus to arrive to correct ψ and φ combination. In order to explore the energy landscape associated with the parameter space of ψ and φ, a PM6 optimisation was performed on 10 conformers picked from the wells pf a more comprehensive potential energy surface scanning through MM2 molecular mechanics force field (the details of PM6 and MM2 are not important for the purposes of this tutorial). This resulted in the identification of 7 energy minima, shown in the table below (listed in increasing order by energy): We will assume that the above listed conformations comprehensively represent most, if not all, of the possible minima in the parameter space of ψ and φ. Under this assumption, Conformation 1 should be considered as the ground state conformation of Vitamin C. The accuracy of the results produced by ROptimus can thus be judged by comparing them to the data listed in the above table. It is important to recognize that the "resolution" of ψ and φ when being optimised by ROptimus is set to 2 degrees due to the manner in which r() was defined. As such, results produced by ROptimus that are within plus or minus 2 degrees of a reference conformation should be tolerated.

Acceptance Ratio Simulated Annealing ROptimus Run
For the Acceptance Ratio Annealing run, we will set NUMITER to 100 000 because each optimsation step is more costly due to the relatively computationally expensive calls to MOPAC. Moreover, we will set CYCLES to 2. Although this shortens the length of an annealing cycle to 50 000 steps (whereas 100 000 steps per cycle has been kept constant over the previous tutorials), having more than 1 annealing cycle is likely more beneficial than insisting on a cycle lasting 100 000 steps as opposed to 50 000.
Optimus(NCPU=4, OPTNAME="vitamin_4_SA", NUMITER=100000, CYCLES=2, DUMP.FREQ=50000, LONG=FALSE, OPT.TYPE="SA", K.INITIAL=K, rDEF=r, mDEF=m, uDEF=u, DATA=NULL) This conformation is equivalent to benchmark Conformation 2. Thus, in this example, Acceptance Ratio Simulated Annealing was able to find the Vitamin C conformation with the second lowest energy in the parameter space. This performance is strong, especially given that the limited steps and cycles executed, and that the energy difference between Conformation 1 and Conformation 2 is only −0.329 kcal/mol.
The graphs below illustrate the system psuedo temperature and observed acceptance ratio for the last 20 000 optimisation iterations executed by CPU 3.

Acceptance Ratio Replica Exchange ROptimus Run
Let us now consider the Replica Exchange version of ROptimus on 12 processors with the variable ACCRATIO defined as in the previous tutorials.
ACCRATIO <-c(90, 82, 74, 66, 58, 50, 42, 34, 26, 18, 10, 2) Just as in the Acceptance Ratio Simulated Annealing run, we will set NUMITER to 100 000. Furthermore, we will set EXCHANGE.FREQ to 500 such that the number of iterations between subsequent exchanges between replicas is 200 as it was in Tutorial 2. For the same reasons as in Tutorial 2, we will set STATWINDOW to 50 for the Replica Exchange run.
Optimus(NCPU=12, OPTNAME="vitamin_12_RE", NUMITER=100000, EXCHANGE.FREQ=500, STATWINDOW=50, DUMP.FREQ=50000, LONG=FALSE, OPT.TYPE="RE", ACCRATIO=ACCRATIO, K.INITIAL=K, rDEF=r, mDEF=m, uDEF=u, DATA=NULL) If we compare the solution found by CPU 4 to benchmark Conformation 1, it is evident that the value for φ found by ROptimus lies slightly outside of the plus or minus 2 degree window that was discussed earlier. Contrarily, CPU 7 finds a solution {φ = 94, ψ = 74} which does lie strictly within the resolution window. Despite this, the solution of CPU 4 has a slightly lower energy (−233.1979) than the solution of CPU 7 (−233.1947) and so represents a better solution. Finally, notice that Replica 6 recovered the same conformation that was identified by Acceptance Ratio Simulated Annealing ROptimus.
The below graphs illustrate the system pseudo temperature and observed acceptance ratio for the first 20 000 optimisation iterations executed by CPU 4 (66% acceptance ratio replica).

Observed Acceptance Ratio (CPU 4 − 66% Acceptance Ratio)
Step Acceptance ratios (%) When the optimisation process is first initialised, it is very unlikely that the input initial temperature is conducive to the target acceptance ratio. As such, the adaptive thermoregulation alters the system pseudotemperature considerably and rapidly to align the observed acceptance ratio with the target acceptance ratio, as can be seen in the above two graphs. Moreover, as stated in the previous tutorial, an exchange between two replicas often has a similar effect of introducing a parameter configuration that is not conducive to the current system pseudo temperature, which necessitates significant temperature adjustments. Accordingly, sharp increases or decreases in the system pseudo temperature and significant changes in the value around which the observed acceptance ratio oscillates in the graph above likely indicate steps at which an exchange involving replica 4 occurred.

Summary
We are now familiar with how to structure a more complex optimisation problem, involving an external program, to be solved with ROptimus as a kernel. On the particular example of geometry optimisation here, we saw that the Simulated Annealing mode of ROptimus was able to find the second lowest local minimum (under restricted number of annealing cycles), while the Replica Exchange mode recovered the global energy minimum.

Tutorial 4: Exploring Coupled ODEs Modelling a Biological System Problem Statement
This Tutorial will demonstrate the use of ROptimus to address a problem from yet another problem class. We will employ ROptimus to recover the rate constants for a system of coupled ordinary differential equations (ODEs) modelling a biological pathway. Specifically, we will study a phosphorelay system from the high osmolarity glycerol (HOG) pathway in yeast. A phosphorelay system is a network involving multiple proteins in which, after an initial phosphorylation event using ATP (or an alternate phosphate donor), the phosphorylation and dephosphorylation events of proteins in the network proceed without further consumption of ATP (Klipp et al. 2009). The below diagram illustrates the phosphorelay system that will be studied in detail (Klipp et al. 2009): Under normal circumstances, the transmembrane protein Sln1, which is present as a dimer, autophosphorylates at a histidine residue (consuming ATP). The phosphate group is then transferred to an aspartate residue of Sln1. Thereafter, the phosphate is transferred to the protein Ypd1 and finally to the protein Ssk1. Ssk1 is continuously dephosphortylated to give an output signal. The signalling pathway is inhibited by an increase is osmolarity outside of the cell (Klipp et al. 2009). If we let A represent Sln1, B represent Ypd1, C represent Ssk1 and XP represent the phophorylated form of protein X, then the above network can be represented by the below schematic (Klipp et al. 2009): where each k i represents the rate constant for the relevant phosphorylation/dephosphorylation reaction.
The above graphic allows us to arrive at the following equations to describe the temporal behavior of the phosphorelay system: Moreover, under the generally accepted assumption that the degradation and production of proteins occurs on a time scale that far exceeds that of phosphorylation events, we have the following conservation relationships (Klipp et al. 2009 Given this model of the phosphorelay system, the question we desire to answer is as follows: given initial concentrations of the three proteins what are the values {k 1 , k 2 , k 3 , k 4 } that result in the proteins having the target concentrations at steady state when the system is allowed to equilibrate from the initial concentrations? This formulation assumes that no information is known about the rate constants and that initial and target concentrations can be determined experimentally, which is often the case in practice (Raue et al. 2013). The problem formulation could be altered depending on the information that is known or that can be determined experimentally.

Defining ROptimus Inputs
Having outlined how the behaviour of the phosphorelay system can be modelled using a system of differential equations, we can now proceed with defining input parameters for Optimus(). We will create a variable state that will be a numeric vector holding the names and initial concentrations of all species in the network. For this tutorial, we will choose Note that the units are arbitrary and that the total sum of units across this vector will remain constant throughout the simulation of the dynamics of the phosphorelay system.
state <-c(cA=100, cB=100, cC=100, cAP=0, cBP=0, cCP=0) Next, we will create a variable target which will be a numeric vector holding the names and target concentration of all species in the network. We will arbitrarily choose target values of target <-c(cA=90, cB=20, cC=70, cAP=10, cBP=80, cCP=30) In order to determine the steady state behavior of the ODE system, we will employ the function ode() from the R library deSolve (this function interfaces with the Fortran library typically used to solve systems of differential equations). This function requires as input a function that describes the dynamics of the ODE system. We will call this function model(). At a high level, model() will simply define the equations derived in the previous section that describe the network. It should contain equations that use the objects with the names specified within state above, and should have equations that assign the outcomes to new objects that have the same order and names as specified in state, but with "d" at the beginning (a more detailed description of the requirements of model() can be found in the R documentation of ode()).
DATA <-NULL DATA$state <-state DATA$target <-target DATA$model <-model We will make K be a numeric vector holding the set of rate constants {k 1 , k 2 , k 3 , k 4 }. We will (arbitrarily) initialize each rate constant to have value 1.0. The function m() will take as input the vector K of rate constants and the list DATA. It will return an object O that contains the concentrations of the six species in the network when the system is simulated from the initial state specified in DATA using the K rate constants for 10 time steps. Note that it is not necessarily guaranteed that the system will reach a steady state after 10 time steps; the number of time steps was chosen such that the optimisation procedure would terminate within 1-2 hours in this example. m() will call the function ode() from the library deSolve, so we must first ensure that deSolve is installed. Recall that the function u() must return an energy E and a quality Q of the candidate solution. Here, u() will set both E and Q to be the RMSD between the steady state concentrations of the network corresponding to the current set of rate constants K, as determined by m(), and the target concentrations. The final mandatory input to Optimus() that must be defined is the alteration function r(). Just as in Tutorial 1, for each snapshot of K, we shall randomly select one of its four coefficients, then either increment or decrement (chosen randomly) it by 0.0002, returning the altered set of coefficients. Since we are dealing with rate constants in this case, if ever r() were to make an entry in K negative, that entry will automatically be set to 0.

Exploring the System Dynamics
Before calling Optimus() to solve this problem, let us first simulate the system of ODEs from the chosen initial state using a few sets of arbitrary rate constants to become familiar with how the system evolves. The below graphs illustrate the evolution of the system for 50 time steps for the rate constants {k 1 = 1.0, k 2 = 1.0, k 3 = 1.0, k 4 = 1.0}:

Concentration of Phosphorylated Species as a function of Time Step
Time Step

Concentration [AP] [BP] [CP]
The table below summarises the initial and final concentrations of the various species when the system is simulated for 50 time steps using the rate constants {k 1 = 1.0, k 2 = 1.0, k 3 = 1.0, k 4 = 1.0}: If instead we use the set of rate constants {k 1 = 1.5, k 2 = 0.5, k 3 = 1.0, k 4 = 1.0}, the system evolves as follows:

Concentration of Phosphorylated Species as a function of Time Step
Time Step

Concentration [AP] [BP] [CP]
The table below summarizes the initial and final concentrations of the various species when the system is simulated for 50 time steps using the rate constants {k 1 = 1.5, k 2 = 0.5, k 3 = 1.0, k 4 = 1.0}:

Acceptance Ratio Simulated Annealing ROptimus Run
We will now call Acceptance Ratio Simulated Annealing ROptimus to solve our problem. Similarly to Tutorial 2, we will execute 200 000 optimisation iterations and perform 2 annealing cycles. We will set DUMP.FREQ to have a value of 100 000.

Concentration of Phosphorylated Species as a function of Time Step
Time Step

Concentration
The table below summarizes the initial and final concentrations of the various species when the system is simulated for 10 time steps using the rate constants {k 1 = 0.7974, k 2 = 0.3586, k 3 = 0.0128, k 4 = 2.3886}: . As alluded to earlier, it is possible that the system has not reached a steady state after 10 time steps, however the above graphs suggest that the concentrations after 10 steps are already extremely close to, if not equal to, steady state values. The optimisation process could be re-executed after increasing the value of the parameter span in the function m() to simulate the system for a larger number of time steps.

Concentration of Phosphorylated Species as a function of Time Step
Time Step

Concentration
The table below summarises the initial and final concentrations of the various protein species when the system is simulated for 10 time steps using the rate constants {k 1 = 0.9834, k 2 = 0.4424, k 3 = 0.0158, k 4 = 2.9492}:

Summary
We have seen how ROptimus can be employed to recover rate constants for a system of coupled ODEs that describes a biological pathway. Given an initial state of the system and a target state, both Acceptance Ratio Simulated Annealing (SA) ROptimus and Replica Exchange (RE) ROptimus found a set of rate constants that resulted in the desired system behaviour upon simulation of the system. In the current setup in this tutorial, RE slightly outperformed SA, both however are not directly comparable unless their allocated resources and times are equalised.

Tutorial 5: Constrained Shuffling of Genomic Contacts to Form a Control Set Problem Statement
Here, we shall consider a problem from the field of 3D genome organisation, to exemplify how ROptimus can optimise a specific constrained shuffling task given a certain set of conditions.
Mammalian genomes are packaged into a complex three-dimensional (3D) structure inside the nucleus, bringing DNA regions linearly near or far from each other into contacts.
Here, we take a set of 734 pairs of i and j 40-kilobase DNA regions that are known to be in contact inside a cell. Binning the DNA (e.g., a single chromosome) into 40-kb regions, each region is represented as a single integer that is equal to its end position divided by the length of the region, which is 40 kb. For instance, the 1st region, with start and end positions at the 1 st and 40000 th nucleotides, respectively, is denoted as 1 (40000 th base /40000 bases = 1). This simplifies the notation for a contact between two regions to a pair of positive integers as shown below:

TASK:
Take note that our set of contacts comply with the following criteria.
1. The set only contains unique, long-range contacts, with i and j regions always separated by greater than 2 megabases (Mb). In terms of the integer identifiers, the threshold is 50 (2M b/40kb = 50), such that (j − i) > 50.
2. i is set to be always less than j. 3. A region can take part in more than one contact (as in region 96 in rows 1 and 2 in the table above), given that the contacts it forms comply with the two aforementioned criteria.
The task herein is to take the DNA regions from the original set (shown above) and to shuffle them to produce a new set of contacts. This is useful for a research on genomic 3D contacts requiring good quality of negative controls that still have the constraints that the original true data possess. The challenge here is to maximise the number of new control contacts that we can devise and that are valid, meaning that they A) still satisfy the aforementioned 3 criteria, and B) are not present in the original set.

Defining ROptimus Inputs
We can solve this highly constrained shuffling problem using ROptimus by doing the following preparations.
Keeping the original order of i's in the loaded real set of contacts (referred hereafter as IJ_ORIG), we can randomly shuffle all j's in the set once, without replacement. The output will be the object K to be supplied to Optimus() for the optimisation.
K <-IJ_ORIG set.seed(840) # Shuffle all j s once K[,"j"] <-sample(x=K[,"j"], size=nrow(K), replace=FALSE) The m() function should then operate on the object K, which, to reiterate, is derived from the original set of contacts with all j's shuffled once (see above). It can also accept a supplementary object DATA that holds data necessary In the u() function, the error percentage O will be directly used as the pseudo energy E, since we do want to minimise this value. Meanwhile, the quality Q for the given snapshot of K, which we want to increase, can be the negative of O.
Finally, r(), which defines how object K will be altered at every step, takes two j's and then swaps them, without changing the order of their partner i's.

Acceptance Ratio Simulated Annealing ROptimus Run
We first use the Acceptance Ratio Simulated Annealing scheme in ROptimus to solve the problem. As in Tutorials 2 and 4, Optimus() is set to do 200 000 steps, 2 annealing cycles, and is to produce 4 independent replicas of the same simulation. Note that if additional data is required by m() or u(), that object should be assigned to DATA argument of Optimus(), otherwise DATA defaults to NULL.
The 4 new sets obtained as optimums from each of the 4 replicates of independent simulated annealing runs have the following error percentages:  Notably, in this setup, none of the 12 new sets of contacts from the replica exchange run are better than the sets produced by the Acceptance Ratio Simulated Annealing mode of ROptimus.

Summary
We have added yet another one to the diverse applications of ROptimus, this time by performing a constrained shuffling of pairs of positive integers that, in this case, represent contacts between DNA regions, to form a new set of control contacts. Additionally, ROptimus showed to be a platform, where certain restrictions or conditions can easily be incorporated to a given shuffling task, which usually is required for specific research objectives. The table below shows the % of erroneous contacts of the best outputs from the acceptance ratio annealing and replica exchange ROptimus runs, with the former producing the better set.

Advanced User Manual
This section will document all possible non-experimental input arguments to Optimus() and will describe the output format of ROptimus. The code snippet below is the first part of the definition of Optimus(), which lists all non-experimental input arguments. We will refer to input arguments without default values as mandatory input arguments and those with default values as optional (with the exception of K.INITIAL, which will be considered a mandatory input argument).

Mandatory Input Arguments
All mandatory input arguments have necessarily already been defined in the tutorials. However, we will reiterate these definitions in this section. Optimus() has four mandatory inputs, mDEF, uDEF, rDEF (referred to as m(), u() and r() respectively in the tutorials) and K.INITIAL. mDEF mDEF must be of type closure. mDEF should be a function designed to operate on the whole set of parameter snapshot K and return the corresponding observable object O. Please note, that the size and shape of K and O are not necessarily to match, depending on the nature of the model used. mDEF must necessarily take K and DATA as input variables, and it must necessarily operate on K to produce O (operating on DATA in optional, see Tutorial 3 for an illustration). uDEF uDEF must be of type closure. uDEF should be a function designed to evaluate the performance of a given snapshot of coefficients K. uDEF should necessarily take as inputs O (the output of mDEF) and the variable DATA. The output of uDEF should have two components, Q holding a single number of the quality of the K coefficients (also used for plotting), and E holding a (pseudo)energy for the given snapshot K. It is important that the returned (pseudo)energy value is lower for better performance/version of K, never vice versa. The Q component of the uDEF function output is only used for plotting the optimisation process, and, if desired, can just repeat the value of the E component. rDEF rDEF must be of type closure. rDEF should be a function that defines a rule by which the K coefficient vector is to be altered from one step to another. rDEF must accept K as an input and return an object equivalent to K, but with some alteration(s).

K.INITIAL
K.INITIAL is an object of any type, which stores the initial values for the parameter(s) to be optimised. The only requirement for K is that it should be something alterable via rDEF and something that influences the outcome of mDEF.

NUMITER
NUMITER is a variable of type double that is the number of steps of the optimisation process per processing core. It has a default value of 1 000 000.

STATWINDOW
STATWINDOW is a variable of type double that is the number of steps executed between subsequent temperature adjustments executed by the Temperature Control Unit. STATWINDOW is also the number of steps used to calculate the observed acceptance ratio). It has a default value of 70.

T.INI
T.INI is a variable of type double that represents the initial system pseudo temperature at the beginning of the optimisation procedure. It has a default value of 0.00001.

T.ADJSTEP
T.ADJSTEP is a variable of type double that represents the baseline temperature change step-size for temperature auto-adjustment. It has a default value of 0.000000005.

TSCLnum
TSCLnum is a variable of type double that indicates the maximum number of STATWINDOWS for which the observed acceptance ratio can sequentially be below or above the ideal acceptance ratio before T.ADJSTEP is scaled by T.SCALING. It has a default value of 2.

T.SCALING
T.SCALING is a variable of type double that represents the value by which T.ADJSTEP is scaled in accordance with the condition specified by TSCLnum. It has a default value of 3.

T.MIN
T.MIN is a variable of type double and is the value that the system pseudo temperature is automatically set to if at any time, the Temperature Control Unit attempts to make the pseudo temperature a negative value. It has a default value of 0.000000005.

T.DELTA
T.DELTA is a variable of type double. If after a STATWINDOW, the observed acceptance ratio is within T.DELTA of the ideal acceptance ratio, the Temperature Control Unit will make no change to the system pseudo temperature. It has a default value of 2.

DUMP.FREQ
DUMP.FREQ is a variable of type double. It is the frequency in steps of printing the best found model to the working directory. It has a default value of 10 000.

LIVEPLOT
LIVEPLOT is a variable of type logical that indicates whether the optimisation process will be plotted in a PDF file in the working directory. It has a default value of TRUE.

LIVEPLOT.FREQ
LIVEPLOT.FREQ is a variable of type double that indicates the frequency in steps of printing the optimisation process in a PDF (this variable is only relevant if LIVEPLOT = TRUE). It has a default value of 100 000.

PDFheight
PDFheight is a variable of type double that indicates the height of the PDF that is produced (if LIVEPLOT = TRUE). It has a default value of 29.

PDFwidth
PDFwidth is a variable of type double that indicates the width of the PDF that is produced (if LIVEPLOT = TRUE). It has a default value of 20.

NCPU
NCPU is a variable of type double that indicates the number of optimisation replicas to execute. If calling the Acceptance Ratio Replica Exchange mode of ROptimus, NCPU must be greater than 1. NCPU has a default value of 4.

LONG
LONG is a variable of type logical. If LONG = TRUE, a memory-friendly version of ROptimus will be activated (in anticipation of a long simulation), and only data from the optimal explored parameter configuration and the last 10 000 optimisation iterations will be stored. LONG has a default value of TRUE.

SEED
SEED is a variable of type double which sets the seed for the random number generator. It has a default value of 840. OPTNAME OPTNAME is a variable of type character that can be thought of as the name of the optimisation process. OPTNAME is used when creating the file names of the ROptimus output. OPTNAME = "" is the default value.

DATA
DATA is a variable of type list holding any additional data that must be accessed by mDEF and uDEF. The default value for DATA is NULL.

OPT.TYPE
OPT.TYPE is a variable of type character, which specifies the mode of ROptimus to execute and should always be equal to "SA" or "RE." If equal to "SA" (for Simulated Annealing), the Acceptance Ratio Simulated Annealing version of ROptimus will be executed. If equal to "RE" (for Replica Exchange), the Acceptance Ratio Replica Exchange version of ROptimus will be executed. The default value of OPT.TYPE is "SA."

DIR
DIR is a variable of type character, which specifies the directory to save the ROptimus outputs. The default value for DIR is "." that means the current directory while running ROptimus. starcore starcore is an experimental variable of type list, holding some parameters for in-lab starcore use only. For example, starcore can be c("MAXCOEFORDER"=4), which will specify the maximum coefficient order. The default value for starcore is NULL, i.e. not activated.

Optional Inputs Specific to Acceptance Ratio Simulated Annealing
The below optional input arguments only impact the Acceptance Ratio Simulated Annealing mode of ROptimus.

[OPTNAME][Core number]_model_ALL
This is the most important output file in that essentially all contents from the other output files is contained in this file. The *_model_ALL file is an R workspace that contains a variable OUTPUT of type list. For Acceptance Ratio Simulated Annealing, OUTPUT has 12 fields (numbered 1-12 below), while for Acceptance Ratio Replica Exchange, OUTPUT contains an additional 14 fields (numbered 13-26 below). Note that the additional fields of OUTPUT in Replica Exchange were included to facilitate the writing of the code; they can largely be ignored by the user. K.stored holds the optimal parameter configuration found by the given processor. O.stored holds the object O generated by mDEF from the optimal parameter configuration K.stored. STEP is a double that holds the current optimisation iteration number. PROB.VEC is a vector that holds the acceptance probability for each optimisation step. T.DE.FACTO is a vector that holds the system pseudo temperature during each optimisation iteration. IDEAL.ACC.VEC is a vector that holds the target acceptance ratio for each optimisation iteration in the case of Acceptance Ratio Simulated Annealing. In the case of Acceptance Ratio Replica Exchange, IDEAL.ACC.VEC holds the same value as the input variable ACCRATIO. ACC.VEC.DE.FACTO is a vector that holds the observed acceptance ratio at the end of each STATWINDOW. STEP4ACC.VEC.DE.FACTO is a vector that holds the optimisation step numbers that correspond to the end of a STATWINDOW. ENERGY.DE.FACTO is a vector that holds the actual system energy E at each optimisation step. Q.STRG is a vector that holds the system quality Q at each optimisation step. ACCEPTANCE is a vector of binary variables whose i th entry is 1 if the candidate parameter configuration was accepted on the i th optimisation step and 0 otherwise. STEP.STORED is a vector storing the number of each optimisation step.
E.stored is a double that stores the energy E associated with the optimal parameter configuration K.stored. E.old is a double that stores the energy E associated with the most recently considered parameter configuration. Q.old is a double that stores the quality Q associated with the most recently considered parameter configuration. K holds the most recently considered parameter configuration. T is a double that holds the current system temperature.
Step.stored is a double that holds the optimisation step number on which the optimal parameter configuration K.stored was discovered. ENERGY.TRIAL.VEC is a vector that holds the energy E of the candidate parameter configuration at every optimisation step. STEP.add is a double that facilitates indexing into output vectors. NumofAccRatSMIdeal is a double that represents the number of times the observed acceptance ratio has sequentially been smaller than the target acceptance ratio. NumofAccRatGRIdeal is a double that represents the number of times the observed acceptance ratio has been sequentially greater than the target acceptance ratio. t.adjstep is a double holding the current value by which the system pseudo temperature is increased or decreased each STATWINDOW by the Temperature Control Unit. AccR.category is a character that indicates whether the observed acceptance ratio was above or below the target acceptance ratio during the previous STATWINDOW. new.T.INI is a double that stores an estimate for the ideal initial system pseudo temperature deduced by the Temperature Control Unit. Finally, instanceOFswitch is a double that tracks the number of times the observed acceptance ratio has transitioned from being less than the target ratio to greater than the target ratio or vice versa.

[OPTNAME][Core number]_model_K
The *_model_K file is an R workspace that contains a variable K.stored of the same type as K.INITIAL. It holds the optimal parameter configuration found by the given processor.

[OPTNAME][Core number]_model_O
The *_model_O file is an R workspace that contains a variable O.stored that holds the object O generated by mDEF from the optimal parameter configuration K.stored.

[OPTNAME][Core number]_model_QE
The *_model_QE file is a text file that stores the values of E and Q that are produced from the optimal parameter configuration K.stored and, in the case of the Acceptance Ratio Replica Exchange mode of ROptimus, stores the target acceptance ratio associated with the given replica.

[OPTNAME][Core number]
The * file is a pdf file that includes 5 plots: 1) acceptance probability as a function of optimisation step; 2) system psuedo temperature as a function of optimisation step; 3) observed (red solid line) and target (black dashed line) acceptance ratio as a function of optimisation step; 4) energy E as a function of optimisation step; 5) quality Q as a function of optimisation step.